Aims and content


The following analyses aim to describe the sleep habits of a sample of undergraduate students (N = 82) and to evaluate the role of demographical, dispositional and contextual predictors. Since data are hierarchically structured into two levels (Level 1 = night, Level 2 = student), these goals are pursued using linear mixed-effects regression (LMER) models.

Specifically, the analyses are conducted following a model comparison approach, where the predictors of interest are subsequently included in more parsimonious models, starting from a null model (with no predictors). Each model is compared to the previous one in terms of data reproducibility (likelihood ratio test, LRT) and plausibility (Aikake Information Criterion, AIC).

The analyses are performed separately for each considered sleep measure: total sleep time (TST), sleep efficiency (SE), sleep onset latency (SOL), wake after sleep onset (WASO), bedtime (BT), and wake time, (WT). The predictors of interest are included in the following order:

  • day of week: differences between weekdays and weekend are analyzed, hypothesizing shorter TST and earlier BT and WT during the weekend

  • sex: differences between females and males are analyzed, hypothesizing earlier bedtime and longer TST for females

  • circadian preferences: higher MEQ.r scores are hypothesized to predict earlier bed (BT) and wake time (WT)

  • perceived sleep disturbances: higher PSQI scores hypothesized to predict lower TST and SE, and longer SOL and WASO

  • depressive symptoms: higher BDI scores are hypothesized to predict more sleep problems, similarly to the above-mentioned hypothesis

  • students’ nap frequency: daily daytime nap as recorded with actigraphy using a set of rules modified from Patel and colleagues (2015)

  • students’ habits: participants reporting to smoke, or to drink coffé and alcohol more frequently are hypothesized to show more sleep problems, similarly to the above-mentioned hypothesis


1. Analytical pipeline


The document is structured as follows:

  • the dataset is loaded and converted from a wide form (one row per participant) to a long form (one row per night)

  • descriptives and intraclass correlation coefficients (ICCs) are computed for each sleep measure

  • a graphical inspection is performed for all sleep measures

  • the goodness of fit is assessed for each sleep measure to check the suitability of the normal and alternative distributions

  • an influential analysis is performed to assess the presence of participants influencing the parameters estimation

  • LMER models comparison is finally performed per each measure and the results of the selected model are discussed


2. Data description


The first step is to load the dataset and to transform it from a wide form (one row per participant) to a long form (one row per night) to be used with LMER models.


rm(list=ls())

library(tidyr); library(textclean); library(dplyr)

# sleep <- read.csv2("Rechecked BT WT all 26.7.19.csv")
sleep <- read.csv2("Rechecked_nap.csv")

# some numeric variables is erroneously read as factor or character
sleep[,13:ncol(sleep)] <- lapply(sleep[,13:ncol(sleep)], as.character)
sleep[,13:ncol(sleep)] <- lapply(sleep[,13:ncol(sleep)], as.numeric)
sleep$AGE <- as.numeric(as.character(sleep$AGE))
sleep$CAFFE_week <- as.numeric(as.character(sleep$CAFFE_week))
sleep$FUMO_Giorno <- as.numeric(as.character(sleep$FUMO_Giorno))
sleep$ALCOL_week <- as.numeric(as.character(sleep$ALCOL_week))

# from wide to long x each variable
sleep.long <- gather(data=sleep,day,BT,BT_LUNEDI:BT_DOMENICA)
sleep.long$WT <- gather(data=sleep,day,WT,WT_LUNEDI:WT_DOMENICA)[,117]
sleep.long$TIB <- gather(data=sleep,day,TIB,TIB_LUNEDI:TIB_DOMENICA)[,117]
sleep.long$TST <- gather(data=sleep,day,TST,TST_LUNEDI:TST_DOMENICA)[,117]
sleep.long$SOL <- gather(data=sleep,day,SOL,SOL_LUNEDI:SOL_DOMENICA)[,117]
sleep.long$WAKE <- gather(data=sleep,day,WAKE,WAKE_LUNEDI:WAKE_DOMENICA)[,117]
sleep.long$SE <- as.numeric(gather(data=sleep,day,SE,SE_LUNEDI:SE_DOMENICA)[,117])
sleep.long$WASO <- gather(data=sleep,day,WASO,WASO_LUNEDI:WASO_DOMENICA)[,117]
sleep.long$NAP <- gather(data=sleep,day,NAP,NAP_LUNEDI:NAP_DOMENICA)[,117]

# converting categorical predictors as factor
sleep.long[c("FUMO..SI.NO.","ALCOL..SI.NO.","CAFFE..SI.NO.",
             "NAP")] <- lapply(sleep.long[c("FUMO..SI.NO.","ALCOL..SI.NO.","CAFFE..SI.NO.",
                                            "NAP")], factor)

# converting two cases of NAP = 2 into NAP = 1
summary(sleep.long$NAP)
##   0   1   2 
## 485  87   2
sleep.long[sleep.long$NAP==2,"NAP"] <- 1
sleep.long$NAP <- as.factor(as.character(sleep.long$NAP))

# recoding days
sleep.long$day <- mgsub(mgsub(sleep.long$day,
                              c("BT","WT","TIB","TST","SOL","WAKE","SE","WASO","_"),
                              rep("",9)),
                        c("LUNEDI","MARTEDI","MERCOLEDI","GIOVEDI","VENERDI","SABATO","DOMENICA"),
                        c(   1,       2,          3,         4,        5,       6,        7))
sleep.long$day <- as.factor(sleep.long$day)

# order by ID and day
sleep.long <- sleep.long[order(sleep.long$ID,sleep.long$day),]
rownames(sleep.long) <- 1:nrow(sleep.long)
sleep.long[,c("ID","day","BT","WT","TIB","TST","SE","SOL","WAKE","SE","WASO")]
# sanity check
sleep.long[1,"TST"] == sleep[sleep$ID=="EM01","TST_LUNEDI"]
## [1] TRUE
# sanity check
sleep.long[sleep.long$ID=="EM04" & sleep.long$day==3,"SE"] == sleep[sleep$ID=="EM04","SE_MERCOLEDI"]
## [1] TRUE
# sanity check
sleep.long[sleep.long$ID=="MZ55" & sleep.long$day==5,"WASO"] == sleep[sleep$ID=="MZ55","WASO_VENERDI"]
## [1] TRUE
# take only columns of interest
sleep.long <- sleep.long[,c("ID","day","BT","WT","TIB","TST","SOL","WAKE","SE","WASO","NAP",
                            colnames(sleep.long)[c(5:23,30:43)])]


Comments:

  • the dataset is now in a long form.


Then, we create the new variable week.time, with value “wd” for weekdays (Sunday to Thursday) and “we” for weekend (Friday to Saturday).


sleep.long$week.time <- "wd"
sleep.long[sleep.long$day==5|sleep.long$day==6,"week.time"] <- "we"
sleep.long$week.time <- as.factor(sleep.long$week.time)
sleep.long <- sleep.long[,c("ID","day","week.time",
                            colnames(sleep.long)[3:(ncol(sleep.long)-1)])]
# sanity check
sleep.long[,c("ID","day","week.time")]


Then, we re-codify Wake Time (WT, i.e., the hours between midnight and lights on), which is currently referred to the subsequent day (e.g., Friday WT is Saturday WT, Saturday WT is Sunday WT).


# adjusting wake time (WT) - expressed as hours from 00:00 --> referred to the subsequent day!
WT <- sleep.long[,c("ID","day","WT")]
WT <- WT %>% 
  group_by(ID) %>% 
  mutate(WT.lag = dplyr::lead(WT, n = 1))
# make SUNDAY WT as that previously indicated on SATURDAY
IDs <- levels(as.factor(WT$ID))
for(ID in IDs){
  WT[WT$ID==ID & WT$day==7,"WT.lag"] <- WT[WT$ID==ID & WT$day==1,"WT"] 
}
WT # sanity check
sleep.long$WT.dayAfter <- as.numeric(WT$WT.lag)
mean(na.omit(sleep.long[sleep.long$week.time=="wd","WT.dayAfter"]))
## [1] 8.516297
mean(na.omit(sleep.long[sleep.long$week.time=="we","WT.dayAfter"]))
## [1] 9.398926


Finally, we express TST and TIB in hours, instead of minutes


sleep.long$TST <- sleep.long$TST/60
sleep.long$TIB <- sleep.long$TIB/60


2.1. Data dictionary

Here the meaning of the variables of interest is explained.


sleep.long


  • ID is the identification code of each participant

  • day is the number of the night where sleep measures are recorded in a given participant. It ranges from 1 (Monday) to 7 (Sunday)

  • week.time indicates if the considered day is a weekday (wd) or weekend (we)

  • BT is bedtime, encoded as the number of hours between the previous midnight and the time when participants started sleeping

  • WT is wake time, encoded as the number of hours between the previous midnight and the time when participants woke up

  • WT.dayAfter is WT referred to the following day

  • TIB is total time in bed, encoded as the minutes between BT and WT

  • TST is total sleep time, encoded as the minutes of effective sleep between BT and WT

  • SE is sleep efficiency, eexpressed as the percentage of sleep time over TIB (SE = 100*TST/TIB)

  • SOL is sleep onset latency, encoded as the minutes between BT and the first sleep epoch (sleep onset)

  • WASO is wake after sleep onset, encoded as the total minutes of wake after sleep onset

  • NAP indicates if a nap was taken (1) or not (0) by a given participant in a given day

  • SEX is participants sex

  • CAFFE_week is the average number of coffees consumed by a given participant each week

  • FUMO..SI.NO. indicates if a given participant is a smoker (N = 24, 29%) or not

  • ALCOL_WEEK is the average number of alcoholic units consumed by a given participant each week

  • MEQ.R is the MEQ-r score of a given participant, indicating his/her circadian preferences

  • BDI.II is the BDI-II score of a given participant, indicating his/her depressive symptoms

  • PSQI is the PSQI score of a given participant, indicating his/her preceived sleep problems


3. Descriptive statistics


Here, our goal is to describe the variables of interest with a focus on intra-individual variability. Specifically, we compute the intraclass correlation coefficients (ICCs) to estimate how the variance is distributed on both the within and the between levels. The ICCs range from 0 (all variance is within participants) to 1 (all variance is between participants).


library(ICC); library(Rmisc); library(knitr); library(dplyr); library(ggplot2); library(knitr)
windowsFonts(CMU=windowsFont("CMU Serif Roman"),time=windowsFont("Times New Roman"))


desc.table <- data.frame(measure=as.character(c("BT","WT.dayAfter","TIB","TST","SE","SOL","WASO")),
                         mean=rep(NA,7),sd=rep(NA,7),
                         mean_F=rep(NA,7),sd_F=rep(NA,7),mean_M=rep(NA,7),sd_M=rep(NA,7),
                         mean_WD=rep(NA,7),sd_WD=rep(NA,7),mean_WE=rep(NA,7),sd_WD=rep(NA,7),ICC=rep(NA,7))
desc.table$measure <- as.character(desc.table$measure)

for(i in 1:nrow(desc.table)){
  desc.table[i,2:3] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
                                              na.rm=TRUE)[3:4],2)
  desc.table[i,4:5] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
                                              groupvars="SEX",na.rm=TRUE)[1,3:4],2)
  desc.table[i,6:7] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
                                              groupvars="SEX",na.rm=TRUE)[2,3:4],2)
  desc.table[i,8:9] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
                                              groupvars="week.time",na.rm=TRUE)[1,3:4],2)
  desc.table[i,10:11] <- round(Rmisc::summarySE(sleep.long, measurevar=desc.table[i,"measure"],
                                              groupvars="week.time",na.rm=TRUE)[2,3:4],2)
  data4ICC <- sleep.long[,c("ID",desc.table[i,"measure"])]
  desc.table[i,12] <- round(ICCest(x=ID,y=desc.table[i,"measure"],data=na.omit(data4ICC))$'ICC',2)
}

kable(desc.table,digits=2, format="pandoc", caption="Table 1: Descriptive Statistics of level-1 variables")
Table 1: Descriptive Statistics of level-1 variables
measure mean sd mean_F sd_F mean_M sd_M mean_WD sd_WD mean_WE sd_WD.1 ICC
BT 25.19 1.72 24.96 1.64 25.46 1.79 24.99 1.55 25.68 2.02 0.38
WT.dayAfter 8.77 1.49 8.73 1.41 8.82 1.59 8.52 1.40 9.40 1.54 0.32
TIB 7.59 1.40 7.78 1.46 7.37 1.30 7.53 1.38 7.76 1.44 0.14
TST 6.56 1.28 6.78 1.34 6.30 1.16 6.50 1.25 6.72 1.33 0.20
SE 86.41 5.77 87.12 5.22 85.58 6.26 86.34 5.78 86.56 5.74 0.64
SOL 8.82 13.48 8.45 15.46 9.25 10.72 8.91 14.27 8.60 11.33 0.24
WASO 52.98 27.27 51.56 25.26 54.66 29.41 52.69 27.81 53.71 25.96 0.53


Comments:

  • descriptive statistics suggest that on average participants fall asleep around 01:00 a.m. and wake up around 8:45 a.m. On average, they spend 7.5 hours in bed, of which only 6.5 hours are effective sleep, with an average sleep efficiency of 86.42%. The average sleep onset latency is 9 minutes and the average wake time after sleep onset is one hour (both measures show high variability between participants).

  • descriptively, we observe no evident differences between weekdays and weekends, with the exception of TIB and TST.

  • all ICCs but those of WASO and SE are below .5, indicating that most variability in sleep measures is intra-individual variability, and thus low sleep consistency in the sample. The more stable sleep measures are WASO and SE, whereas the more varying measure, is TIB.


We also create an extended version of this table, showing means and SD by gender and week day.


desc.table2 <- data.frame(measure=rep(c("BT","WT.dayAfter","TIB","TST","SE","SOL","WASO"),each=2),
                          sex=rep(c("F","M"),7),
                          MONmean=rep(NA,14),MONsd=rep(NA,14),TUEmean=rep(NA,14),TUEsd=rep(NA,14),
                          WEDmean=rep(NA,14),WEDsd=rep(NA,14),THUmean=rep(NA,14),THUsd=rep(NA,14),
                          FRYmean=rep(NA,14),FRYsd=rep(NA,14),SATmean=rep(NA,14),SATsd=rep(NA,14),
                          SUNmean=rep(NA,14),SUNsd=rep(NA,14))

for(i in 1:nrow(desc.table2)){
  stats <- Rmisc::summarySE(sleep.long,measurevar=as.character(desc.table2[i,"measure"]),
                            groupvars=c("SEX","day"),na.rm=TRUE)
  desc.table2[i,c("MONmean","MONsd")] <- round(stats[stats$day==1&stats$SEX==desc.table2[i,"sex"],4:5],2)
  desc.table2[i,c("TUEmean","TUEsd")] <- round(stats[stats$day==2&stats$SEX==desc.table2[i,"sex"],4:5],2)
  desc.table2[i,c("WEDmean","WEDsd")] <- round(stats[stats$day==3&stats$SEX==desc.table2[i,"sex"],4:5],2)
  desc.table2[i,c("THUmean","THUsd")] <- round(stats[stats$day==4&stats$SEX==desc.table2[i,"sex"],4:5],2)
  desc.table2[i,c("FRYmean","FRYsd")] <- round(stats[stats$day==5&stats$SEX==desc.table2[i,"sex"],4:5],2)
  desc.table2[i,c("SATmean","SATsd")] <- round(stats[stats$day==6&stats$SEX==desc.table2[i,"sex"],4:5],2)
  desc.table2[i,c("SUNmean","SUNsd")] <- round(stats[stats$day==7&stats$SEX==desc.table2[i,"sex"],4:5],2)
}

kable(desc.table2,digits=2, format="pandoc", caption="Table 1: Descriptive Statistics by gender and day")
Table 1: Descriptive Statistics by gender and day
measure sex MONmean MONsd TUEmean TUEsd WEDmean WEDsd THUmean THUsd FRYmean FRYsd SATmean SATsd SUNmean SUNsd
BT F 24.70 1.42 24.66 1.06 24.85 1.44 24.92 1.51 25.19 1.58 25.75 2.42 24.63 1.52
BT M 25.09 1.47 25.16 1.41 25.37 1.74 25.16 1.36 25.81 2.11 26.06 1.82 25.55 2.26
WT.dayAfter F 8.28 1.15 8.11 1.09 8.68 1.25 8.86 1.60 9.11 1.51 9.47 1.43 8.60 1.34
WT.dayAfter M 8.49 1.28 8.17 1.59 8.50 1.44 8.49 1.30 9.34 1.82 9.72 1.37 8.97 1.79
TIB F 7.48 1.56 7.51 1.31 7.93 1.17 7.76 1.73 8.01 1.40 7.88 1.53 7.91 1.43
TIB M 7.38 1.24 7.03 1.13 7.10 1.18 7.38 1.35 7.47 1.25 7.63 1.55 7.60 1.34
TST F 6.54 1.49 6.53 1.16 6.86 1.01 6.78 1.54 7.02 1.42 6.85 1.33 6.92 1.33
TST M 6.23 1.10 6.00 1.01 6.10 0.91 6.30 1.16 6.46 1.15 6.50 1.35 6.49 1.34
SE F 87.14 5.22 87.06 5.21 86.58 4.52 87.28 5.05 87.29 7.00 87.08 4.56 87.39 4.89
SE M 84.67 6.55 85.54 6.89 86.44 6.00 85.54 5.73 86.45 5.51 85.25 5.58 85.15 7.57
SOL F 9.05 12.18 7.98 10.48 9.41 19.69 10.14 27.77 8.16 10.53 7.66 10.23 6.80 8.14
SOL M 8.78 8.75 8.08 7.78 7.97 9.17 10.16 11.39 8.53 14.03 10.29 10.67 10.89 12.23
WASO F 47.91 21.91 50.91 24.70 54.95 23.43 48.95 29.15 50.98 24.33 54.16 28.41 53.02 24.99
WASO M 59.81 33.55 53.76 32.51 51.78 26.64 54.37 27.03 52.55 25.30 57.50 26.05 52.87 34.80


Finally, we can take a look at the variables on level 2 (between-individuals):


gridExtra::grid.arrange(ggplot(sleep,aes(SEX))+geom_bar()+ggtitle("SEX"),
                        ggplot(sleep,aes(round(AGE,0)))+geom_bar()+ggtitle("AGE"),
                        ggplot(sleep,aes(FUMO..SI.NO.))+geom_bar()+ggtitle("SMOKE"),
                        ggplot(sleep,aes(ALCOL..SI.NO.))+geom_bar()+ggtitle("ALCOHOL"))

desc2 <- rbind(
  sleep[sleep$CAFFE..SI.NO.==1,] %>% 
  dplyr::select(CAFFE_week) %>%
  gather("Variable", "value") %>% 
  group_by(Variable) %>%
  summarise(Mean=round(mean(value, na.rm=TRUE),2),
            SD=round(sd(value, na.rm=TRUE),2), 
            min=round(min(value, na.rm=TRUE), 2),
            max=round(max(value, na.rm=TRUE),2),
            N=length(which(!is.na(value)))),
  sleep[sleep$FUMO..SI.NO.==1,] %>% 
  dplyr::select(FUMO_Giorno) %>%
  gather("Variable", "value") %>% 
  group_by(Variable) %>%
  summarise(Mean=round(mean(value, na.rm=TRUE),2),
            SD=round(sd(value, na.rm=TRUE),2), 
            min=round(min(value, na.rm=TRUE), 2),
            max=round(max(value, na.rm=TRUE),2),
            N=length(which(!is.na(value)))),
  sleep[sleep$ALCOL..SI.NO.==1,] %>% 
  dplyr::select(ALCOL_week) %>%
  gather("Variable", "value") %>% 
  group_by(Variable) %>%
  summarise(Mean=round(mean(value, na.rm=TRUE),2),
            SD=round(sd(value, na.rm=TRUE),2), 
            min=round(min(value, na.rm=TRUE), 2),
            max=round(max(value, na.rm=TRUE),2),
            N=length(which(!is.na(value)))),
  sleep %>% 
  dplyr::select(c(MEQ.r:PSQI)) %>%
  gather("Variable", "value") %>% 
  group_by(Variable) %>%
  summarise(Mean=round(mean(value, na.rm=TRUE),2),
            SD=round(sd(value, na.rm=TRUE),2), 
            min=round(min(value, na.rm=TRUE), 2),
            max=round(max(value, na.rm=TRUE),2),
            N=length(which(!is.na(value)))))
  
kable(desc2,digits=2, format="pandoc", caption="Table 2: Descriptive Statistics of level-2 variables")
Table 2: Descriptive Statistics of level-2 variables
Mean SD min max N
12.43 7.46 1.0 35 66
6.54 5.33 1.0 20 24
4.08 3.65 0.5 15 72
9.84 5.92 0.0 38 246
gridExtra::grid.arrange(ggplot(sleep,aes(BDI.II))+geom_bar()+ggtitle("BDI-II"),
                        ggplot(sleep,aes(round(PSQI,0)))+geom_bar()+ggtitle("PSQI"),
                        ggplot(sleep,aes(MEQ.r))+geom_bar()+ggtitle("MEQ.r"),nrow=3)

ggplot(sleep,aes(x=PSQI,fill=SEX))+
  geom_histogram(color="white",position="identity",binwidth = 1)+
  geom_vline(xintercept=5.5,lty=2,lwd=1)+
  scale_fill_manual(name="Gender",
                     values=c("lightgray",rgb(.08, .08, .08,alpha=.5)),
                     labels=c("Women","Men"))+
  theme_classic()+labs(x="PSQI scores",y="Frequency")+
  theme(text=element_text(size=20,face="bold",family="time"))


Comments:

  • the sample is gender-balanced, but it is mainly composed by non-smokers (N = 58, 70%) and alcohol users (N = 72, 88%).

  • the sample is also homogeneous in terms of age, with the exception of one 35-year-old participant (could be an influential case!)

  • on average, participants consume 12 coffees per week, with smokers reporting to smoke an average number of 6.5 cigarettes per day and alcohol users reporting to drink an average number of 4 alcoholic units per week

  • the average BDI-II score is 10.22, which is below the suggested cutoff of 13. Only 22 participants (27%) show a score higher than 13.

  • the average MEQ-r score is 13.23, suggesting that the average participant reports an “intermediate” circadian preference, with 6 (7%) “early birds” (preferring earlier bed and wake time) and 22 (27%) “night owls” (preferring later bed and wake time)

  • the average PSQI score is 6.06, which is slightly above the suggested cut-off of 5, suggesting the presence of a relatively high number of participants reporting sleep disturbances. Indeed, 44 participants (54%) have a PSQI score higher than 5


4. Graphics


Here, the variables’ distributions are further examined by a graphical inspection. A first data is related to sleep consistency and it can be shown in terms of the standard deviation of each considered sleep metric.


library(ggplot2); library(extrafont); library(cowplot); library(Rmisc)
# function for geom_flat_violin
source("https://gist.githubusercontent.com/benmarwick/2a1bb0133ff568cbe28d/raw/fb53bd97121f7f9ce947837ef1a4c65a73bffb3f/geom_flat_violin.R")


# metrics standard deviations
gridExtra::grid.arrange(ggplot(sleep,aes(x=WT_SD))+
                          geom_histogram(bins=40) + ggtitle("Wake Time") + xlab("St. Dev") +
                          theme(text=element_text(family="CMU")),
                        ggplot(sleep,aes(x=BT_SD))+
                          geom_histogram(bins=40) + ggtitle("Bed Time") + xlab("St. Dev") +
                          theme(text=element_text(family="CMU")),
                        ggplot(sleep,aes(x=TIB_SD))+
                          geom_histogram(bins=40) + ggtitle("Time in Bed") + xlab("St. Dev")+
                          theme(text=element_text(family="CMU")),
                        ggplot(sleep,aes(x=TST_SD))+
                          geom_histogram(bins=40) + ggtitle("Total Sleep Time") + xlab("St. Dev")+
                          theme(text=element_text(family="CMU")),
                        ggplot(sleep,aes(x=SOL_SD))+
                          geom_histogram(bins=40) + ggtitle("Sleep Onset Time") + xlab("St. Dev")+
                          theme(text=element_text(family="CMU")),
                        ggplot(sleep,aes(x=SE_SD))+
                          geom_histogram(bins=40) + ggtitle("Sleep Efficiency") + xlab("St. Dev")+
                          theme(text=element_text(family="CMU")),
                        ggplot(sleep,aes(x=WASO_SD))+
                          geom_histogram(bins=40) + ggtitle("Wake After Sleep Onset") + 
                          xlab("St. Dev")+ theme(text=element_text(family="CMU")),nrow=2)


Comments:

  • we can observe a degree of variability in each variable, with most cases ranging between 50 and 70 minutes for sleep and wake time, time in bed and total sleep time, and between 10 and 20 minutes for sleep onset latency and wake after sleep onset.

  • the most consistent variable is sleep efficiency, which shows an average SD around 3%.


Then, we can take a look at the day-by-day variability within each participant. Below, each variable is plotted day-by-day for each participant and considering the whole sample.


sleep.long$day <- as.numeric(sleep.long$day)

# TIB AND TST by participant
ggplot(sleep.long,aes(x=day,y=TIB)) +
  geom_smooth(se=FALSE,span=0.7,colour="black") +
  geom_point(size=1,colour="black") +
  geom_smooth(aes(y=TST),colour="#99c2ff",se=FALSE,span=0.7) + # insoddisfatto
  geom_point(aes(y=TST),size=1,colour="#99c2ff") +
  geom_vline(aes(xintercept=5.5))+
  facet_wrap("ID",strip.position="right") +
  ggtitle("Time in bed (black) and Total Sleep Time (blue)")+ylab("TST (min)")+
  theme(axis.text = element_text(size=5),text=element_text(family="CMU"))+
  scale_x_continuous(breaks = c(2,4,6),labels=c("TUE","THU","SAT"))

# TIB AND TST distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=TST,fill=as.factor(as.character(day))))+
  geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
  geom_flat_violin(aes(y=TIB),fill="gray",position = position_nudge(x = .2, y = 0),adjust =2,size=1.5,alpha=.5)+
  geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
  geom_boxplot(aes(x = as.factor(as.character(day)), y = TST, fill = as.factor(as.character(day))),size=1.5,
               outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
  labs(x="",y="TST (min)")+ggtitle("Total Sleep Time and Time in bed (in gray)")+
  scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
  theme(text =  element_text(family = "CMU",size=18,face="bold"),legend.position = "none")

# SE by participants
ggplot(sleep.long,aes(x=day,y=SE)) +
  geom_smooth(se=FALSE,span=0.7,colour="black") +
  geom_point(size=1,colour="black") +
  geom_vline(aes(xintercept=5.5))+
  facet_wrap("ID",strip.position="right") +
  ggtitle("Sleep Efficiency")+ylab("SE (%)")+
  theme(axis.text = element_text(size=5),text=element_text(family="CMU"))+
  scale_x_continuous(breaks = c(2,4,6),labels=c("TUE","THU","SAT"))

# SE distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=SE,fill=as.factor(as.character(day))))+
  geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
  geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
  geom_boxplot(aes(x = as.factor(as.character(day)), y = SE, fill = as.factor(as.character(day))),size=1.5,
               outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
  labs(x="",y="SE (%)")+ggtitle("Sleep Efficiency")+
  scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
  theme(text =  element_text(family = "CMU",size=18,face="bold"),legend.position = "none")

# WASO and SOL by participant
ggplot(sleep.long,aes(x=day,y=WASO)) +
  geom_smooth(se=FALSE,span=0.7,colour="darkred") +
  geom_point(size=1,colour="darkred") +
  geom_smooth(aes(y=SOL),se=FALSE,span=0.7,colour="salmon") +
  geom_point(aes(y=SOL),size=1,colour="salmon") +
  geom_vline(aes(xintercept=5.5))+
  facet_wrap("ID",strip.position="right") +
  ggtitle("Wake After Sleep Onset (dark red) and Sleep Onset Latency (light red)")+ylab("WASO and SOL (min)")+
  theme(axis.text = element_text(size=5),text=element_text(family="CMU"))+
  scale_x_continuous(breaks = c(2,4,6),labels=c("TUE","THU","SAT"))

# WASO distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=WASO,fill=as.factor(as.character(day))))+
  geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
  geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
  geom_boxplot(aes(x = as.factor(as.character(day)), y = WASO, fill = as.factor(as.character(day))),size=1.5,
               outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
  labs(x="",y="WASO (min)")+ggtitle("Wake After Sleep Onset")+
  scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
  theme(text =  element_text(family = "CMU",size=18,face="bold"),legend.position = "none")

# SOL distributions
ggplot(sleep.long[!is.na(sleep.long$SOL)&sleep.long$SOL<100,],aes(x=as.factor(as.character(day)),y=SOL,fill=as.factor(as.character(day))))+
  geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
  geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
  geom_boxplot(aes(x = as.factor(as.character(day)), y = SOL, fill = as.factor(as.character(day))),size=1.5,
               outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
  labs(x="",y="SOL (min)")+ggtitle("Sleep Onset Latency")+
  scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
  theme(text =  element_text(family = "CMU",size=18,face="bold"),legend.position = "none")

# WT distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=WT,fill=as.factor(as.character(day))))+
  geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
  geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
  geom_boxplot(aes(x = as.factor(as.character(day)), y = WT, fill = as.factor(as.character(day))),size=1.5,
               outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
  labs(x="",y="")+ggtitle("Wake time")+
  scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
  scale_y_continuous(breaks=c(4,8,12,16),labels=c("04:00","08:00","12:00","16:00"))+
  theme(text =  element_text(family = "CMU",size=18,face="bold"),legend.position = "none")

# BT distributions
ggplot(sleep.long,aes(x=as.factor(as.character(day)),y=BT-24,fill=as.factor(as.character(day))))+
  geom_flat_violin(position = position_nudge(x = .2, y = 0),adjust =2,size=1.5)+
  geom_point(aes(col=as.factor(as.character(day))),position = position_jitter(width = .15),size=2)+
  geom_boxplot(aes(x = as.factor(as.character(day)), y = BT-24, fill = as.factor(as.character(day))),size=1.5,
               outlier.shape = NA, alpha = .5, width = .1, colour = "black")+
  labs(x="",y="")+ggtitle("Bed time")+
  scale_x_discrete(labels=c("MON","TUE","WED","THU","FRY","SAT","SUN"))+
  scale_y_continuous(breaks=c(0,5,10),labels=c("00:00","05:00","10:00"))+
  theme(text =  element_text(family = "CMU",size=18,face="bold"),legend.position = "none")

# mean values (WT.dayAfter is wake time on the day after)
sleep.long$day <- factor(sleep.long$day,levels=c(7,6,5,4,3,2,1))
means <- summarySE(sleep.long,measurevar="WT.dayAfter",groupvars="day",na.rm=TRUE)
means$BT <- summarySE(sleep.long,measurevar="BT",groupvars="day",na.rm=TRUE)[,3]
means$BT.sd <- summarySE(sleep.long,measurevar="BT",groupvars="day",na.rm=TRUE)[,4]
means$BT.se <- summarySE(sleep.long,measurevar="BT",groupvars="day",na.rm=TRUE)[,5]
means <- means[order(means$day,decreasing=TRUE),]

# plotting BT and following WT
ggplot(sleep.long,aes(x=day,y=BT-24))+
  geom_flat_violin(adjust =2,size=0,alpha=.5,fill="gray")+
  geom_flat_violin(aes(y=WT.dayAfter),
                   adjust =2,size=0,alpha=.5,fill="gray")+
  # geom_boxplot(outlier.shape = NA,width = .1,fill="gray",
  #              position = position_nudge(x = -.2, y = 0))+
  # geom_boxplot(aes(y=WT2),outlier.shape = NA,width = .1,fill="gray",
  #              position = position_nudge(x = -.2, y = 0))+
  geom_segment(data=means,aes(x=day,xend=day,
                              y=WT.dayAfter,yend=BT-24),size=2,colour="black")+
  geom_point(data=means,aes(x=day,y=WT.dayAfter),size=6,shape=21,fill="gray")+
  geom_point(data=means,aes(x=day,y=BT-24),size=6,shape=22,fill="gray")+
  labs(x="",y="")+
  scale_x_discrete(labels=c("SUN","SAT","FRY","THU","WED","TUE","MON"))+
  scale_y_continuous(breaks=c(0,2.5,5,7.5,10,12.5),limits=c(-1,11),
                     labels=c("00:00","02:30","05:00","07:30","10:00","12:30"))+
  theme_classic()+
  theme(text =  element_text(family = "time",face="bold",size=18),legend.position = "top")+
  coord_flip()


Comments:

  • We observe no clear differences between weekdays and weekend for TIB, TST, SE, WASO, and SOL.

  • In contrast, slightly higher WT and BT values are showed on Saturday compared to other days, suggesting that participants fall asleep later and wake up later on Saturday.

  • We can notice one outlier (EM39) for WT on day 4 (Thursday).

  • The latter graph suggest that both Wednesday and Saturday show slightly higher variability compared to other days.


5. Goodness of fit


library(lme4); library(jtools); library(effects) # modeling
library(fitdistrplus) ; library(car) ; library(lattice) # gof


In this section, we assess the goodness of fit (GOF) of each model specified in the “LMER MODELS” section. Specifically, we focus on the most complex model speciefied per each sleep measure, and we evaluate (A) the normality of resuduals distribution, (B) the normality of random effects distribution, and (C) the independence of residuals from fitted values (homoscedasticity).


PIPELINE


For each sleep metric, we assess the GOF of the most complex model (including all predictors of interest) and we assess the following LMER models’ assumptions:

  • independence between residuals and fitted values,

  • the normality of both residuals’ and random effects’ distribution,

  • that random effects are centered on zero,


TST


Typically, TST shows a symmetrical shape that can be modeled with a normal distribution (as shown in the GRAPHICS section).


m <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (good)
plot(fitdist(resid(m),distr="norm",method="mle"))

# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))

# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
                        qqmath(residuals(m),main="Residual QQPlot"),
                        dotplot(ranef(m))[[1]],
                        plot(ranef(m),main="Random Effects QQplot")[[1]],
                        nrow=2)


Comments:

  • The normal distribution shows an excellent fit on TST’s residuals and a good fit on random effects.

  • All other LMER assumptions seem to be fulfilled.


Conclusions:

We can model TST under the normality assumption.


SE


SE is bounded between 0 (when TST = 0) and 100 (when TST = TIB). Thus, a beta regression could be the best option, but at the time of these analyses no packages able to deal with mixed-effects beta regression were available (to our knowledge).

In our sample, only 4 on 82 showed an SE lower than 70%: EM07 (night 5), EM22 (night 36), EM36 (night 7) and EM53 (all 7 nights), whereas only one participant showed an SE = 99% and no participants showed a 100% SE.


# SE < 70%
sleep.long[!is.na(sleep.long$SE)&sleep.long$SE<70,
           c("ID","day","SE","TIB","TST","WASO","SOL")]
mean(sleep.long[sleep.long$ID=="MZ53","SE"])
## [1] 61.77
mean(sleep.long[sleep.long$ID=="MZ53","WASO"])
## [1] 148.2857
# SE = 99%
sleep.long[!is.na(sleep.long$SE)&sleep.long$SE==99,
           c("ID","day","SE","TIB","TST","WASO","SOL")]


The frequency distribution shows a quite normal shape, which is symmetric and centered approximately on 87-88%. The issue of modeling these data using a normal distribution is that the models would predict values above 100%, whereas TST cannot be higher than TIB (thus, SE cannot be higher than 100%).


hist(sleep.long$SE,breaks=100)


With no clues on how to model the data differently, we accept this limitation and we proceed by evaluating the fit of the normal distribution. Indeed, SE can be seen as a sort of transformation of TST, which can be reasonably modeled assuming residuals normality. SE results should confirm those obtained on TST, and we could focus our goal on assessing the impact of our predictors of interest on SE, instead of predicting meaningful SE values.


m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (acceptable)
plot(fitdist(resid(m),distr="norm",method="mle"))

# random effects normal fit (acceptable)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))

# LMER assumptions (acceptable)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
                        qqmath(residuals(m),main="Residual QQPlot"),
                        dotplot(ranef(m))[[1]],
                        plot(ranef(m),main="Random Effects QQplot")[[1]],
                        nrow=2)


Comments:

  • The normal distribution does not show a very good fit on SE’s residuals or random effects. This is especially due to the four outliers (EM07, EM22, EM36 and MZ53) showing SE lower than 70% (2+ hours of WASO, 2 to 5 hours of TST).

  • All other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are normally distributed and centered on zero).


Let’s see what happens to residuals normality if we exclude the two outliers:


m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
          sleep.long[sleep.long$ID!="EM07"&sleep.long$ID!="EM22"&sleep.long$ID!="EM36"&sleep.long$ID!="MZ53",]) # excluding outliers
# residuals normal fit (acceptable)
plot(fitdist(resid(m),distr="norm",method="mle"))


Comments:

  • the normal fit is much better without participants with SE <70%, and especially without EM07 and MZ53 (SE < 60%).


We can also try to log-transform the data considering the whole sample.


m <- lmer(log(SE)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data=sleep.long)
# residuals normal fit (worse)
plot(fitdist(resid(m),distr="norm",method="mle"))


Comments:

  • if we do not exclude the outliers, the log-transformation does not improve the fit of the normal distribution.


Conclusions:

Predicting SE scores assuming a normal residuals distribution implies that SE could be above 100% or (with much less probability) below 0%, which cannot be the case. However, with no clues on alternative ways to model our data, the normal distribution seems to be the better way. Looking at the data, we evaluate the normal fit on SE residuals as not excellent but at least acceptable. Removing four outliers would improve the fit.


SOL


SOL is a variable bounded on zero, typically skewed on the right, with few cases showing more than 1 hour of latency. Thus, we can try to log transform it to fit a model under normality, or we might use the Gamma family, which is bounded on zero.


# normal fit on log-transformed SOL (a constant is added to avoid cases of log(0))
m <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (acceptable?)
plot(fitdist(resid(m),distr="norm",method="mle"))

# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))

# LMER assumptions (acceptable)
gridExtra::grid.arrange(plot(m,main="Res. vs Fitted"),
                        qqmath(residuals(m),main="Residual QQPlot"),
                        dotplot(ranef(m))[[1]],
                        plot(ranef(m),main="Random Effects QQplot")[[1]],
                        nrow=2)


Comments:

  • the fit of the normal distribution is not very good but acceptable

  • all other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are normally distributed and centered on zero)


Then, we check the fit of the Gamma family.


m <- glmer(SOL+.05~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long,
           family="Gamma")
plot(fitdist(resid(m),distr="norm",method="mle"))


Conclusions:

The fit of the Gamma family on raw SOL data is worse than that of the normal distribution on log-transformed data. The best option seems to log-transform.


WASO


As TST, WASO usually shows a symmetrical shape that can be modeled with a normal distribution (as shown in the GRAPHICS section).


m <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (good)
plot(fitdist(resid(m),distr="norm",method="mle"))

# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))

# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
                        qqmath(residuals(m),main="Residual QQPlot"),
                        dotplot(ranef(m))[[1]],
                        plot(ranef(m),main="Random Effects QQplot")[[1]],
                        nrow=2)


Comments:

  • the normal distribution shows a good fit on WASO’s residuals and a good fit on random effects

  • all other LMER assumptions seem to be fulfilled


Conclusions:

We can model WASO under the normality assumption.


BT


We expect to observe a symmetric shape also for BT.


m <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (acceptable)
plot(fitdist(resid(m),distr="norm",method="mle"))

# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))

# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
                        qqmath(residuals(m),main="Residual QQPlot"),
                        dotplot(ranef(m))[[1]],
                        plot(ranef(m),main="Random Effects QQplot")[[1]],
                        nrow=2)

# outliers
sleep.long[!is.na(sleep.long$BT)&sleep.long$BT>30,c("ID","day","BT","WT")]


Comments:

  • the fit of the normal distribution is not very good but acceptable

  • we can observe some outliers (especially EM22 and MZ36) showed BT higher than 34

  • all other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are normally distributed and centered on zero)


Let’s see what happens if we exclude the two outliers:


m <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
          sleep.long[sleep.long$ID!="EM22"&sleep.long$ID!="MZ36",]) # excluding outliers
# residuals normal fit (slighlty better)
plot(fitdist(resid(m),distr="norm",method="mle"))


Comments:

  • the normal fit is slighlty better without participants EM22 and MZ36.


Also a log transformation slighlty improves the normal fit, but the improvement is not substantial.


m <- lmer(log(BT)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (slighlty better)
plot(fitdist(resid(m),distr="norm",method="mle"))


Conclusions:

The normal fit on BT is not excellent but at least acceptable, removing two outliers seems to improve the fit. There is no evident reason to log transform the data.


WT


We expect to observe a symmetric shape also for WT.


m <- lmer(WT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)
# residuals normal fit (good)
plot(fitdist(resid(m),distr="norm",method="mle"))

# random effects normal fit (good)
plot(fitdist(ranef(m)$'ID'[[1]],distr="norm",method="mle"))

# LMER assumptions (good)
gridExtra::grid.arrange(plot(m,main="Resid vs Fitted"),
                        qqmath(residuals(m),main="Residual QQPlot"),
                        dotplot(ranef(m))[[1]],
                        plot(ranef(m),main="Random Effects QQplot")[[1]],
                        nrow=2)


Comments:

  • the normal distribution fits well both residuals and random effects for WT

  • all other LMER assumptions seem to be fulfilled (residuals are independent of fitted values, and random effects are centered on zero)


Conclusions:

We can model WT under the normality assumption.


6. Influential analysis


Influential cases are cases that strongly determine the size of the parameters estimated by a model, negatively influencing the statistical fit and generalizability of the model. Influential analysis is based on the principle that the iterative exclusion of single cases from the data considered by a target model should not produce substantially different parameters estimates (Nieuwenhuis, Te Grotenhuis & Pelzer, 2012).

In this section, we perform an influential analysis by using the Cook’s distance (CD) (that indicates, for each observation i, the distance between the parameters vector of the model M and that of M-i. High CD values indicate that the observation has a strong influence on the parameters estimated by the model. A cut-off of 4/N can be used to identify the most influential cases based on the CD.


PIPELINE


For each sleep measure, we use the most complex model (with all predictors of interest) to proceed as follows:

  • we compute the Cook’s distance by considering a cut-off set to 4/N,

  • we re-estimate the parameters by removing all potentially influent cases, one-by-one,

  • if a given case shows to substantially influence the parameters estimation, we replicate the influential analysis without that case,

  • finally, we decide which cases can be removed,


TST


# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$TST)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
                           !is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                           !is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
                           !is.na(sleep.long$CAFFE_week),
                         c("ID","day","TST","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
                            "ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)

# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N

# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
  if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm

# Plotting Cook's distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
     main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
                " ( NInfl =",nrow(pm[pm$infl==1,]),")"),
     pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
       pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
     pm[pm$cd>=cutoff,"cd"]-0.005,
     paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
       lwd=1.8,col="blue",lty=2)

# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
##  [1] "EM06" "EM07" "EM09" "EM22" "EM23" "EM36" "EM42" "em51" "EM53" "MZ04"
## [11] "MZ08" "MZ22" "MZ25" "MZ34" "MZ39" "MZ40" "MZ44" "MZ46" "MZ50" "MZ55"
## [21] "MZ56"


Comments:

  • 30 observations (i.e., combination of participant and day) due to 21 participants are more extreme than others and show a CD above the cutoff


In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.


# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(PM[PM$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
  new.data <- PM[PM$ID!=ID,]
  m4 <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
  new.data <- as.data.frame(round(summary(m4)$coefficients,3))
  new.data$ID <- ID
  parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
                                     "ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
                                   length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="week.timewe" & (parameters$est<=12|parameters$est>=14),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="week.timewe" & (parameters$est<=12|parameters$est>=14),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)


Comments:

  • Larger differences are shown by the parameters estimated for week.time (ranging from 12 to 16, with t-values ranging from 2.0 to 2.4), MEQ-R (ranging from -1.25 to 0.00, with t-values ranging from -1.00 to 0.00, and MZ44 and MZ56 leading to an average decrease of .4 in the estimated parameter), ALCOHOL (ranging from -1.6 to -.2, with t-values ranging from -1.0 to 0.0, and MZ44 leading to an absolute increase of .75 in the estimated parameter), SMOKE (ranging from -15 to -8, with t-values ranging from -1.4 to -.8, and EM06, EM22 and EM36 leading to an average absolute increase of .2 in the estimated parameter).

  • EM22 and EM36 are among the most influential cases. Interestingly, they are also two of the four outliers found for SE.

  • None of the potentially influential cases seems to substantially change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).


Conclusions:

We decide to keep all participants in TST analyses.


SE


# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$SE)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
                           !is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                           !is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
                           !is.na(sleep.long$CAFFE_week),
                         c("ID","day","SE","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
                            "ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)

# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N

# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
  if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm

# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
     main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
                " ( NInfl =",nrow(pm[pm$infl==1,]),")"),
     pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
       pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
     pm[pm$cd>=cutoff,"cd"]-0.005,
     paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
       lwd=1.8,col="blue",lty=2)

# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
##  [1] "EM02" "EM04" "EM07" "EM08" "EM10" "EM12" "EM16" "EM22" "EM23" "EM29"
## [11] "EM36" "EM39" "EM54" "MZ03" "MZ04" "MZ05" "MZ09" "MZ15" "MZ21" "MZ25"
## [21] "MZ30" "MZ35" "MZ36" "MZ38" "MZ53" "MZ56"


Comments:

  • 39 observations (i.e., combination of participant and day) due to 26 participants are more extreme than others and show a CD above the cutoff


In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects


# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
  new.data <- data2check[data2check$ID!=ID,]
  m4 <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
  new.data <- as.data.frame(round(summary(m4)$coefficients,3))
  new.data$ID <- ID
  parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
                                     "ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
                                   length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="week.timewe" & (parameters$est<=.1|parameters$est>=.3),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="week.timewe" & (parameters$est<=.1|parameters$est>=.3),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)


Comments:

  • Larger differences are shown by the parameter estimated for NAP (ranging from -.6 to -1.1, with t-values ranging from -1.6 to -2.3, and EM07 showing the strongest influence, leading to an increase of .4 in the estimated NAP coefficient). Further differences are shown by the parameters estimated for sex (ranging from -2.0 to -1.0, with t-values ranging from -1.6 to -1.0, and MZ53 leading to an absolute increase of .75 in the estimated parameter), and smoke (ranging from -3 to -1.5, with t-values ranging from -2.5 to -1.5, and MZ53 leading to an absolute increase of 1.0 in the estimated parameter).

  • MZ53 shows a degree of influence on almost all parameters, although the interpretation of the results does not change substantially (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative). The only exception is smoke, whose effect seems to strongly depend on MZ53.

  • MZ53 is also one of the four outliers found for SE, with all nights showing an SE < .70.


Thus, we replicate the influential analysis by removing EM07 and MZ53. We start with the former.


OUT <- data2check[data2check$ID!="EM07",] # excluding influential case
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)

# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
  new.data <- OUT[OUT$ID!=ID,]
  m4 <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
  new.data <- as.data.frame(round(summary(m4)$coefficients,3))
  new.data$ID <- ID
  parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
                                     "ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
                                   length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters



Comments:

  • The exclusion of EM07 decreases the effect of NAP and makes the estimated parameter more consistent.

  • Participant MZ53 continues to show a degree of influence on many parameters, and particularly on smoke.


Thus, we replicate the influential analysis by removing MZ53.


OUT <- OUT[OUT$ID!="MZ53",] # excluding influential case
m <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)

# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
  new.data <- OUT[OUT$ID!=ID,]
  m4 <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
  new.data <- as.data.frame(round(summary(m4)$coefficients,3))
  new.data$ID <- ID
  parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
                                     "ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
                                   length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters



Comments:

  • The exclusion of MZ53 decreases the effect of smoke and makes the estimated parameter more consistent. None of the other parameters show to be substantially affected by the removal of this participant.

  • We do not observe other potentially influential cases.


Conclusions:

We decide to perform the analyses on SE without EM07 and MZ53.


SOL


# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$SOL)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
                           !is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                           !is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
                           !is.na(sleep.long$CAFFE_week),
                         c("ID","day","SOL","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
                            "ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),sleep.long)

# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N

# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
  if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm

# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
     main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
                " ( NInfl =",nrow(pm[pm$infl==1,]),")"),
     pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
       pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
     pm[pm$cd>=cutoff,"cd"]-0.005,
     paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
       lwd=1.8,col="blue",lty=2)

# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
##  [1] "EM04" "EM05" "EM08" "EM09" "EM12" "EM18" "EM23" "EM24" "EM26" "EM44"
## [11] "EM53" "EM54" "MZ02" "MZ04" "MZ05" "MZ14" "MZ15" "MZ21" "MZ22" "MZ23"
## [21] "MZ31" "MZ35" "MZ36" "MZ39" "MZ47" "MZ49" "MZ50" "MZ51" "MZ55" "MZ57"


Comments:

  • 38 observations (i.e., combination of participant and day) due to 30 participants are more extreme than others and show a CD above the cutoff


In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.


# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
  new.data <- data2check[data2check$ID!=ID,]
  m4 <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)

  new.data <- as.data.frame(round(summary(m4)$coefficients,3))
  new.data$ID <- ID
  parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
                                     "ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
                                   length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est<=(-.1),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est<=(-.1),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)


Comments:

  • None of the estimated parameters seems to be substantially affected by potentially influential cases.

  • Two participants show stronger influence on several parameters: MZ15 for sex, MEQ-r, PSQI, and BDI; MZ23 for alcohol and coffee. However, also these participants do not change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).


Conclusions:

We decide to keep all participants in SOL analyses.


WASO


# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$WASO)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
                           !is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                           !is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
                           !is.na(sleep.long$CAFFE_week),
                         c("ID","day","WASO","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
                            "ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)

# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N

# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
  if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm

# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
     main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
                " ( NInfl =",nrow(pm[pm$infl==1,]),")"),
     pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
       pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
     pm[pm$cd>=cutoff,"cd"]-0.005,
     paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
       lwd=1.8,col="blue",lty=2)

# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
##  [1] "EM04" "EM07" "EM09" "EM10" "EM12" "EM17" "EM22" "EM29" "EM38" "EM39"
## [11] "EM50" "EM53" "EM54" "EM55" "MZ03" "MZ16" "MZ17" "MZ25" "MZ30" "MZ34"
## [21] "MZ36" "MZ42" "MZ46" "MZ49" "MZ53" "MZ56"


Comments:

  • 44 observations (i.e., combination of participant and day) due to 26 participants are more extreme than others and show a CD above the cutoff


In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.


# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(PM[PM$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
  new.data <- PM[PM$ID!=ID,]
  m4 <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
  new.data <- as.data.frame(round(summary(m4)$coefficients,3))
  new.data$ID <- ID
  parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
                                     "ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
                                   length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est>=1.5,],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="week.timewe" & parameters$est>=1.5,],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)



Comments:

  • Larger differences are shown by the parameter estimated for smoke (ranging from 7 to 13, with t-values ranging from 1.4 to 2.2), whose effect is strongly influenced by MZ53 (leading to an increase of 4 in the estimated smoke coefficient).

  • MZ53 is the participant influencing more coefficients (week.time, sex, MEQ-r, PSQI, alcohol, and smoke), followed by EM54 (affecting the parameter for alcohol), MZ16 (influencing the effect of coffee) and MZ25 (influencing the effect of BDI-II). MZ53 is the most influential case (this participant is also one of the outliers for SE).


Thus, we replicate the influential analysis by removing MZ53.


OUT <- data2check[data2check$ID!="MZ53",] # excluding influential case
m <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),OUT)

# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(OUT[OUT$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
  new.data <- OUT[OUT$ID!=ID,]
  m4 <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
  new.data <- as.data.frame(round(summary(m4)$coefficients,3))
  new.data$ID <- ID
  parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
                                     "ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
                                   length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters



Comments:

  • The exclusion of MZ53 decreases and makes more consistent the parameter estimated for smoke. Slight changes are observable also for other parameters.

  • There are still some participants showing a degree of influence on the estimation of the parameters, especially MZ25 (affecting sex and PSQI), MZ25 (affecting PSQI, BDI-II, and smoke), and MZ36 (affecting sex, alcohol, and smoke). However, the exclusion of these participant does not substantially change the estimated parameters.


Conclusions:

We decide to exclude only **MZ53* from WASo analyses.


BT


# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$BT)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
                           !is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                           !is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
                           !is.na(sleep.long$CAFFE_week),
                         c("ID","day","BT","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
                            "ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)

# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N

# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
  if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm

# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
     main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
                " ( NInfl =",nrow(pm[pm$infl==1,]),")"),
     pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
       pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
     pm[pm$cd>=cutoff,"cd"]-0.005,
     paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
       lwd=1.8,col="blue",lty=2)

# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
##  [1] "EM07" "EM08" "EM09" "EM22" "EM23" "EM29" "EM36" "EM40" "EM42" "EM45"
## [11] "MZ03" "MZ04" "MZ17" "MZ22" "MZ25" "MZ34" "MZ40" "MZ50" "MZ53" "MZ54"
## [21] "MZ55" "MZ56"


Comments:

  • 28 observations (i.e., combination of participant and day) due to 22 participants are more extreme than others and show a CD above the cutoff


In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.


# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
  new.data <- PM[PM$ID!=ID,]
  m4 <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
  new.data <- as.data.frame(round(summary(m4)$coefficients,3))
  new.data$ID <- ID
  parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
                                     "ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
                                   length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)

# SEX
p1 <- ggplot(parameters[parameters$pred=="SEXM",],aes(ID,est))+
  geom_point()+ggtitle("SEX effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="SEXM"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="SEXM" & parameters$est<=.1,],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="SEXM",],aes(ID,t))+
  geom_point()+ggtitle("SEX effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="SEXM"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="SEXM" & parameters$est<=.1,],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)

# MEQ.r
p1 <- ggplot(parameters[parameters$pred=="MEQ.r",],aes(ID,est))+
  geom_point()+ggtitle("MEQ-r effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="MEQ.r"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="MEQ.r" & (parameters$est<=(-.15)|parameters$est>=(-.13)),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="MEQ.r",],aes(ID,t))+
  geom_point()+ggtitle("MEQ-r effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="MEQ.r"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="MEQ.r" & (parameters$est<=(-.15)|parameters$est>=(-.13)),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)

# PSQI
p1 <- ggplot(parameters[parameters$pred=="PSQI",],aes(ID,est))+
  geom_point()+ggtitle("PSQI effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="PSQI"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="PSQI" & parameters$est<=(-.02),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="PSQI",],aes(ID,t))+
  geom_point()+ggtitle("PSQI effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="PSQI"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="PSQI" & parameters$est<=(-.02),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)

# BDI-II
p1 <- ggplot(parameters[parameters$pred=="BDI.II",],aes(ID,est))+
  geom_point()+ggtitle("BDI-II effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="BDI.II"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="BDI.II" & parameters$est<=(-.01),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="BDI.II",],aes(ID,t))+
  geom_point()+ggtitle("BDI-II effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="BDI.II"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="BDI.II" & parameters$est<=(-.01),],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)

# NAP
p1 <- ggplot(parameters[parameters$pred=="NAP",],aes(ID,est))+
  geom_point()+ggtitle("NAP effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="NAP"&parameters$ID=="All",],colour="red")
p2 <- ggplot(parameters[parameters$pred=="NAP",],aes(ID,t))+
  geom_point()+ggtitle("NAP effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="NAP"&parameters$ID=="All",],colour="red")
gridExtra::grid.arrange(p1,p2,nrow=2)

# ALCOL
p1 <- ggplot(parameters[parameters$pred=="ALCOL_week",],aes(ID,est))+
  geom_point()+ggtitle("ALCOHOL effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="ALCOL_week"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="ALCOL_week" & parameters$est<=.05,],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="ALCOL_week",],aes(ID,t))+
  geom_point()+ggtitle("ALCOHOL effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="ALCOL_week"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="ALCOL_week" & parameters$est<=.05,],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)

# SMOKE
p1 <- ggplot(parameters[parameters$pred=="FUMO..SI.NO.1",],aes(ID,est))+
  geom_point()+ggtitle("SMOKE effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="FUMO..SI.NO.1"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="FUMO..SI.NO.1" & parameters$est<=.35,],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="FUMO..SI.NO.1",],aes(ID,t))+
  geom_point()+ggtitle("SMOKE effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="FUMO..SI.NO.1"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="FUMO..SI.NO.1" & parameters$est<=.35,],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)

# COFFEE
p1 <- ggplot(parameters[parameters$pred=="CAFFE_week",],aes(ID,est))+
  geom_point()+ggtitle("COFFE effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="CAFFE_week"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="CAFFE_week" & parameters$est>=.009,],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="CAFFE_week",],aes(ID,t))+
  geom_point()+ggtitle("COFFE effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="CAFFE_week"&parameters$ID=="All",],colour="red")+
  geom_text(data=parameters[parameters$pred=="CAFFE_week" & parameters$est>=.009,],
            aes(label=ID),nudge_x=-1) + theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)


Comments:

  • Two participants show stronger influence on several parameters: EM36 for week.time, sex, MEQ-r, PSQI, and smoke; MZ22 for EQr, alcohol and coffee. Alcohol is the mostly influenced parameter, whose magitude is decreased .072 (t = 1.89) to .047 (t = 1.22).

  • However, the exclusion of these participants does not change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative).


Conclusions:

We decide to keep all participants in BT data analysis.


WT


# taking only ID and variables of interest (only continuous), specifying the model
data2check <- sleep.long[!is.na(sleep.long$WT.dayAfter)&!is.na(sleep.long$week.time)&!is.na(sleep.long$SEX)&
                           !is.na(sleep.long$MEQ.r)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                           !is.na(sleep.long$ALCOL_week)&!is.na(sleep.long$FUMO..SI.NO.)&
                           !is.na(sleep.long$CAFFE_week),
                         c("ID","day","WT.dayAfter","week.time","SEX","MEQ.r","PSQI","BDI.II","NAP",
                            "ALCOL_week","FUMO..SI.NO.","CAFFE_week")]
m <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),data2check)

# computing Cook's dinstance and cutoff
cd <- cooks.distance(m)
data2check$cd <- cd
cutoff <- 4/length(levels(data2check$ID)) # 4/N

# selecting influential cases
index <- c(1:nrow(data2check)) # nrow
pm=cbind(data2check,cd,index)
pm=pm[order(pm$cd,decreasing = TRUE),] # order
pm$infl <- NA
for(i in 1:nrow(pm)){
  if(pm[i,"cd"] >= cutoff){ pm[i,"infl"] <- 1 } else { pm[i,"infl"] <- 0 }
}
PM <- pm

# Plotting mahalanobis distance
par(mfrow=c(1,1),mar=c(6, 5, 4, 2) + 0.1)
plot(cd, xlab="Index",ylab="Cook's distance",
     main=paste("Cook's Distance \ncut-off: 4/N =",round(cutoff,3),
                " ( NInfl =",nrow(pm[pm$infl==1,]),")"),
     pch=20,cex.lab=1.6,cex.axis=1.4,col.main="blue")
points(pm[pm$cd>=cutoff,"index"],
       pm[pm$cd>=cutoff,"cd"],pch=20,cex=1.3,col="red")
text(pm[pm$cd>=cutoff,"index"]-10,
     pm[pm$cd>=cutoff,"cd"]-0.005,
     paste(pm[pm$cd>=cutoff,"ID"],pm[pm$cd>=cutoff,"day"]),cex=.8)
abline(h=cutoff,
       lwd=1.8,col="blue",lty=2)

# participants showing cases above the cutoff
levels(as.factor(as.character(pm[pm$cd>cutoff,"ID"])))
##  [1] "EM01" "EM07" "EM08" "EM09" "EM17" "EM22" "EM30" "EM36" "EM39" "EM47"
## [11] "em51" "MZ03" "MZ04" "MZ08" "MZ17" "MZ22" "MZ25" "MZ39" "MZ40" "MZ49"
## [21] "MZ54" "MZ56"


Comments:

  • 32 observations (i.e., combination of participant and day) due to 22 participants are more extreme than others and show a CD above the cutoff


In the next step, we re-estimate the parameters by excluding each potentially influential participant, one-by-one. We focus on substantial changes in the estimated parameters, such that the excluded cases implies a drastic change in the interpretation of the effects.


# parameters with all participants
parameters <- as.data.frame(round(summary(m)$coefficients,3))
parameters$ID <- "All"
IDs <- levels(as.factor(as.character(data2check[data2check$cd>cutoff,"ID"]))) # ID-day couples with CD > cutoff
for(ID in IDs){
  new.data <- PM[PM$ID!=ID,]
  m4 <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),new.data)
  new.data <- as.data.frame(round(summary(m4)$coefficients,3))
  new.data$ID <- ID
  parameters <- rbind(parameters,new.data)
}
parameters <- cbind(parameters,rep(c("Intercept","week.timewe","SEXM","MEQ.r","PSQI","BDI.II","NAP",
                                     "ALCOL_week","FUMO..SI.NO.1","CAFFE_week"),
                                   length(IDs)+1))
colnames(parameters) <- c("est","st.err","t","ID","pred")
parameters
# week.time
p1 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,est))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  theme(axis.text.x = element_blank())
p2 <- ggplot(parameters[parameters$pred=="week.timewe",],aes(ID,t))+
  geom_point()+ggtitle("week.time effect - original (red) vs. removing cases")+xlab("")+
  geom_point(data=parameters[parameters$pred=="week.timewe"&parameters$ID=="All",],colour="red")+
  theme(axis.text.x = element_blank())
gridExtra::grid.arrange(p1,p2,nrow=2)


Comments:

  • None of the estimated parameters seems to be substantially affected by potentially influential cases.

  • MZ22 is the participant influencing more parameters, and particoularly MEQ-r, alcohol, smoke, and coffee comsumption. However, the exclusion of these participants does not change the interpretation of the results (i.e., non-significant effects remain non-significant, positive effects remain positive and negative effects remain negative). The strongest influence of this participant is on alcohol assumption, whose estimated parameter changes from .05 (t = 1.82) to .03 (t = 1.08) after the removal of MZ22.


infl <- influence.ME::influence(m,"ID")
cd <- influence.ME::cooks.distance.estex(infl, parameter=8, sort=TRUE)[,1]
cd[which(names(cd)=="MZ22")]
##      MZ22 
## 0.5399679
4/nlevels(data2check$ID) # cutoff
## [1] 0.04878049


Conclusions:

We decide to keep all participants in WT data analysis.


7. LMER Models


In this section, we use the information from previous sections to specify a set of LMER models for each sleep metric of interest. As mentioned above, we proceed with a hierarchical regression by including the predictors of interest one-by-one, and we perform a model comparison starting from a null model (with no predictors) until the model including all predictors. Each model is compared to the previous one using the likelihood ratio test (LRT) and the Aikake Information Criterion (AIC).


library(lme4); library(effects); library(MuMIn); library(arm) # modeling
library(gridExtra); library(jtools); library(knitr); library(sjPlot) # outputs
options(knitr.kable.NA = '')


PIPELINE


For each sleep measure:

  • We specify the models including the predictors in the order mentioned above.

  • The set of models is precessed using the LRT and the AIC. The AIC weight (AICw) is used to express AIC in a probabilistic way (i.e., each model is associated with a probability, from 0 to 1, to be the most plausible model), and the evidence ratio (dAIC) is used to assess how much each model is less plausible than the selected model.

  • Finally, the outcomes of the selected model are assessed.


TST


TST is modeled assuming a normal distribution for residuals and including all participants (see previous sections).


# excluding missing observations (3 rows)
clean <- sleep.long[!is.na(sleep.long$TST)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                      !is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]

# null model
null <- lmer(TST~(1|ID),
             data=clean,REML=FALSE)

# predictors
week.time <- lmer(TST~week.time+(1|ID),
             data=clean,REML=FALSE)
SEX <- lmer(TST~week.time+SEX+(1|ID),
            data=clean,REML=FALSE)
MEQ.r <- lmer(TST~week.time+SEX+MEQ.r+(1|ID),
              data=clean,REML=FALSE)
PSQI <- lmer(TST~week.time+SEX+MEQ.r+PSQI+(1|ID),
             data=clean,REML=FALSE)
BDI <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
            data=clean,REML=FALSE)
NAP <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
            data=clean,REML=FALSE)
sub <- lmer(TST~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
            data=clean,REML=FALSE)


All models converge with no problems. We can proceed with the model comparison.


LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A


Comments:

  • The model including the effect of week.time and that including also sex are the only models associated to a significantly higher logLikelihood compared to the null model for TST (Chisq(1) = 10.05, p = .001).

  • The model including only sex and week.time is also the model with the lowest AIC (AICw = -.57), followed by the model including MEQ.r (AICw = .21). The dAIC suggests that the former is about 2.5 times more plausible than the latter.


Now, let’s proceed by evaluating the output of the selected model.


summary(SEX)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: TST ~ week.time + SEX + (1 | ID)
##    Data: clean
## 
##      AIC      BIC   logLik deviance df.resid 
##   1852.3   1874.0   -921.1   1842.3      566 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3789 -0.5449  0.0632  0.5636  3.8587 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 0.2665   0.5162  
##  Residual             1.2983   1.1394  
## Number of obs: 571, groups:  ID, 82
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   6.7188     0.1057  63.549
## week.timewe   0.2305     0.1054   2.187
## SEXM         -0.4874     0.1491  -3.269
## 
## Correlation of Fixed Effects:
##             (Intr) wk.tmw
## week.timewe -0.285       
## SEXM        -0.651 -0.003


Comments:

  • TST is only predicted by participants’ sex (with males showing shorter sleep duration), and week time (with weekend being associated to longer sleep duration than weekdays).


Below, the effects are plotted and a summary table is generated.


# effect plot
plot(allEffects(SEX))

# output table
TST <- cbind(Model=c("Intercept","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP","substances"),
             round(LRT[,c("Chisq","Pr(>Chisq)")],2),
             A[order(A$df),],
             rbind(round(coef(summary(SEX)),2),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3)))
rownames(TST) <- rownames(LRT)
kable(TST,format="pandoc",digits=2,align="l",caption="LMER models comparison and outputs")
LMER models comparison and outputs
Model Chisq Pr(>Chisq) df AIC AICw dAIC Estimate Std. Error t value
Intercept 3 1863.03 0.00 219.20 6.72 0.11 63.55
Week time 4.73 0.03 4 1860.30 0.01 55.98 0.23 0.11 2.19
Sex (males) 10.05 0.00 5 1852.25 0.57 1.00 -0.49 0.15 -3.27
MEQ-r 0.04 0.84 6 1854.22 0.21 2.68
PSQI 0.17 0.68 7 1856.04 0.09 6.65
BDI 1.69 0.19 8 1856.35 0.07 7.77
NAP 0.09 0.76 9 1858.26 0.03 20.19
substances 5.40 0.14 12 1858.86 0.02 27.25


Finally, we compute 95% Bootstrap confidence intervals around the Intercept of the null model, to have a general idea of the sample average TST. We do not use the intercept of the selected model since it is related to specific levels of the included predictors (i.e., females during weekdays).


# bootsrap confidence intervals
ci <- lme4::confint.merMod(null,method="boot",nsim=10000,level=0.95)
ci <- as.data.frame(ci[3:nrow(ci),])
ci # intercept CI (minutes)
ci/60 # intercept CI (hours)


Conclusions:


Total sleep duration was predicted only by participants’ sex, with males showing a shorter duration than females. On average, participants sleep between 6:29 and 6:43 hours.


SE


SE is modeled assuming a normal distribution for residuals and excluding participant EM07 and MZ53 (see previous sections).


# excluding missing observations and influential case (10 rows)
clean <- sleep.long[!is.na(sleep.long$SE)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                      !is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r)&
                      sleep.long$ID!="EM07" & sleep.long$ID!="MZ53",]

# null model
null <- lmer(SE~(1|ID),
             data=clean,REML=FALSE)

# predictors
week.time <- lmer(SE~week.time+(1|ID),
             data=clean,REML=FALSE)
SEX <- lmer(SE~week.time+SEX+(1|ID),
            data=clean,REML=FALSE)
MEQ.r <- lmer(SE~week.time+SEX+MEQ.r+(1|ID),
              data=clean,REML=FALSE)
PSQI <- lmer(SE~week.time+SEX+MEQ.r+PSQI+(1|ID),
             data=clean,REML=FALSE)
BDI <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
            data=clean,REML=FALSE)
NAP <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
            data=clean,REML=FALSE)
sub <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
            data=clean,REML=FALSE)


All models converge with no problems. We can proceed with the model comparison.


LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A


Comments:

  • None of the specified models shows a significantly higher logLikelihood than the null model.

  • Consistently, the null model shows the lowest AIC (AICw = .37), with model 1 (including only week.time) showing the second lowest AIC (AICw = .25, dAIC = 1.50). The dAIC suggests that the null model is about 2.5 times more plausible than model 1.


# output table
SE <- cbind(Model=c("Null","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP","substances"),
             round(LRT[,c("Chisq","Pr(>Chisq)")],2),
             A[order(A$df),],
             rbind(round(coef(summary(null)),2),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3)))
kable(SE,format="pandoc",digits=2,align="l",caption="LMER models comparison and outputs")
LMER models comparison and outputs
Model Chisq Pr(>Chisq) df AIC AICw dAIC Estimate Std. Error t value
Null 3 3051.75 0.37 1.00 86.84 0.43 201.66
Week time 1.19 0.28 4 3052.56 0.25 1.50
Sex (males) 1.57 0.21 5 3052.99 0.20 1.86
MEQ-r 0.02 0.89 6 3054.97 0.07 5.00
PSQI 0.98 0.32 7 3055.98 0.04 8.29
BDI 0.83 0.36 8 3057.16 0.02 14.95
NAP 1.97 0.16 9 3057.19 0.02 15.18
substances 4.55 0.21 12 3058.63 0.01 31.19


Let’s see what the results would have been if EM07 and MZ53 were not excluded.


# excluding missing observations and influential case (10 rows)
clean <- sleep.long[!is.na(sleep.long$SE)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                      !is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]

# null model
null <- lmer(SE~(1|ID),
             data=clean,REML=FALSE)

# predictors
week.time <- lmer(SE~week.time+(1|ID),
             data=clean,REML=FALSE)
SEX <- lmer(SE~week.time+SEX+(1|ID),
            data=clean,REML=FALSE)
MEQ.r <- lmer(SE~week.time+SEX+MEQ.r+(1|ID),
              data=clean,REML=FALSE)
PSQI <- lmer(SE~week.time+SEX+MEQ.r+PSQI+(1|ID),
             data=clean,REML=FALSE)
BDI <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
            data=clean,REML=FALSE)
NAP <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
            data=clean,REML=FALSE)
sub <- lmer(SE~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
            data=clean,REML=FALSE)

LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A
summary(sub)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## SE ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +  
##     FUMO..SI.NO. + CAFFE_week + (1 | ID)
##    Data: clean
## 
##      AIC      BIC   logLik deviance df.resid 
##   3254.4   3306.5  -1615.2   3230.4      559 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -7.9886 -0.4386  0.0794  0.5408  3.0929 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 17.95    4.236   
##  Residual             11.79    3.434   
## Number of obs: 571, groups:  ID, 82
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)   91.039003   2.490126  36.560
## week.timewe    0.166186   0.318242   0.522
## SEXM          -1.786910   1.111971  -1.607
## MEQ.r         -0.129717   0.143768  -0.902
## PSQI          -0.003726   0.219586  -0.017
## BDI.II        -0.044591   0.078835  -0.566
## NAP1          -0.908227   0.453040  -2.005
## ALCOL_week     0.122889   0.168164   0.731
## FUMO..SI.NO.1 -2.611636   1.182318  -2.209
## CAFFE_week    -0.117700   0.066067  -1.782
## 
## Correlation of Fixed Effects:
##             (Intr) wk.tmw SEXM   MEQ.r  PSQI   BDI.II NAP1   ALCOL_ FUMO..
## week.timewe -0.038                                                        
## SEXM        -0.212 -0.001                                                 
## MEQ.r       -0.809  0.000  0.028                                          
## PSQI        -0.403 -0.001  0.151  0.040                                   
## BDI.II      -0.124  0.001 -0.149  0.140 -0.547                            
## NAP1        -0.020  0.056  0.020 -0.001  0.001 -0.017                     
## ALCOL_week  -0.031  0.000 -0.445  0.026 -0.130  0.185 -0.010              
## FUMO..SI.NO -0.168 -0.001  0.044  0.163  0.062 -0.095 -0.019 -0.291       
## CAFFE_week  -0.042 -0.001  0.197 -0.212  0.028 -0.054 -0.017 -0.332 -0.134


Comments:

  • Including EM07 and MZ53 would have led to an effect of NAP and Smoke, with lower SE on days with NAP compared to days without NAP (coef. = -.91, SE = .45, t = -2.00) and on smokers compared to non-smokers (coef. = -2.61, SE = 1.18, t = -2.21).


Conclusions:


None of the included predictors exerted a significant effect on SE. The effect of NAP was due to participant EM07, and the effect of Smoke was due to participant MZ53.


SOL


SOL is log-transformed and modeled assuming a normal distribution for residuals. All participants are included in the analysis (see previous sections).


# excluding missing observations (3 rows)
clean <- sleep.long[!is.na(sleep.long$SOL)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                      !is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]

# null model
null <- lmer(log(SOL+.05)~(1|ID),
             data=clean,REML=FALSE)

# predictors
week.time <- lmer(log(SOL+.05)~week.time+(1|ID),
             data=clean,REML=FALSE)
SEX <- lmer(log(SOL+.05)~week.time+SEX+(1|ID),
            data=clean,REML=FALSE)
MEQ.r <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+(1|ID),
              data=clean,REML=FALSE)
PSQI <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+(1|ID),
             data=clean,REML=FALSE)
BDI <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
            data=clean,REML=FALSE)
NAP <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
            data=clean,REML=FALSE)
sub <- lmer(log(SOL+.05)~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
            data=clean,REML=FALSE)


All models converge with no problems. We can proceed with the model comparison.


LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A


Comments:

  • The models including the effect of BDI-II and substances are the only models associated with a significantly higher logLikelihood compared to the null model for SE.

  • Coherently, the last model shows the lowest AIC (AICw = .44), with the model with BDI-II showing the second lowest AIC (AICw = .17). The dAIC suggests that the former is about 2.64 times more plausible than the latter.


Now, let’s proceed by evaluating the output of the selected model.


summary(sub)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## log(SOL + 0.05) ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP +  
##     ALCOL_week + FUMO..SI.NO. + CAFFE_week + (1 | ID)
##    Data: clean
## 
##      AIC      BIC   logLik deviance df.resid 
##   2216.6   2268.7  -1096.3   2192.6      559 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0690 -0.3713  0.1379  0.6418  2.1937 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 0.5539   0.7442  
##  Residual             2.3710   1.5398  
## Number of obs: 571, groups:  ID, 82
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept)    1.39892    0.53359   2.622
## week.timewe   -0.04886    0.14265  -0.342
## SEXM           0.60406    0.23763   2.542
## MEQ.r         -0.04752    0.03069  -1.548
## PSQI          -0.03439    0.04686  -0.734
## BDI.II         0.03629    0.01684   2.155
## NAP1           0.23762    0.19455   1.221
## ALCOL_week    -0.08853    0.03588  -2.467
## FUMO..SI.NO.1 -0.13386    0.25244  -0.530
## CAFFE_week     0.03042    0.01409   2.158
## 
## Correlation of Fixed Effects:
##             (Intr) wk.tmw SEXM   MEQ.r  PSQI   BDI.II NAP1   ALCOL_ FUMO..
## week.timewe -0.079                                                        
## SEXM        -0.210 -0.001                                                 
## MEQ.r       -0.807  0.000  0.027                                          
## PSQI        -0.403 -0.001  0.152  0.041                                   
## BDI.II      -0.124  0.001 -0.152  0.141 -0.546                            
## NAP1        -0.040  0.054  0.041 -0.001  0.003 -0.034                     
## ALCOL_week  -0.031  0.001 -0.446  0.026 -0.130  0.187 -0.021              
## FUMO..SI.NO -0.169 -0.002  0.043  0.164  0.063 -0.094 -0.038 -0.290       
## CAFFE_week  -0.041 -0.002  0.196 -0.212  0.028 -0.053 -0.034 -0.332 -0.133


Comments:

  • SOL is mainly predicted by participants’ sex (with longer SOL for males), BDI-II score, and the weekly amount of consumed alcohol and coffee.


Below, the effects are plotted and a summary table is generated.


# effect plots
plot_model(sub, type = "eff",dot.size=7,line.size=3)$'SEX'+geom_line(lwd=2)+
  labs(x="Gender",y="log(SOL + .05)")+ggtitle("")+
  # geom_point(data=sleep.long,aes(BDI.II,SOL),alpha=.3)+ylim(0,3)+xlim(0,35)+
  theme(text=element_text(size=20,face="bold",family="time"))

plot_model(sub, type = "eff")$'BDI.II'+geom_line(lwd=2)+
  labs(x="BDI-II scores",y="log(SOL + .05)")+ggtitle("")+
  geom_point(data=sleep.long,aes(BDI.II,SOL),alpha=.3)+ylim(0,3)+xlim(0,35)+
  theme(text=element_text(size=20,face="bold",family="time"))

plot_model(sub, type = "eff")$'ALCOL_week'+geom_line(lwd=2)+
  labs(x="Weekly alcohol consumption (alcoholic units)",y="log(SOL + .05)")+ggtitle("")+
  geom_point(data=sleep.long,aes(ALCOL_week,SOL),alpha=.3)+ylim(-.5,2)+xlim(0,13)+
  theme(text=element_text(size=20,face="bold",family="time"))

plot_model(sub, type = "eff")$'CAFFE_week'+geom_line(lwd=2)+
  labs(x="Weekly coffee consumption (n. of cups)",y="log(SOL + .05)")+ggtitle("")+
  geom_point(data=sleep.long,aes(CAFFE_week,SOL),alpha=.3)+ylim(-.5,3)+xlim(0,35)+
  theme(text=element_text(size=20,face="bold",family="time"))

# output table
SOL <- cbind(Model=c("Null","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP",
                     "substances (alcohol)","substances (smoke)","substances (coffee)"),
             rbind(round(LRT[,c("Chisq","Pr(>Chisq)")],2),NA,NA),
             rbind(A[order(A$df),],NA,NA),
             round(coef(summary(sub)),2))
kable(SOL,format="pandoc",digits=2,align="l",caption="LMER models comparison and outputs")
LMER models comparison and outputs
Model Chisq Pr(>Chisq) df AIC AICw dAIC Estimate Std. Error t value
Null 3 2219.37 0.11 4.08 1.40 0.53 2.62
Week time 0.17 0.68 4 2221.20 0.04 10.18 -0.05 0.14 -0.34
Sex (males) 2.39 0.12 5 2220.81 0.05 8.37 0.60 0.24 2.54
MEQ-r 2.06 0.15 6 2220.75 0.05 8.13 -0.05 0.03 -1.55
PSQI 0.19 0.66 7 2222.57 0.02 20.19 -0.03 0.05 -0.73
BDI 6.07 0.01 8 2218.50 0.17 2.64 0.04 0.02 2.16
NAP 1.35 0.24 9 2219.14 0.12 3.63 0.24 0.19 1.22
substances (alcohol) 8.59 0.04 12 2216.56 0.44 1.00 -0.09 0.04 -2.47
substances (smoke) -0.13 0.25 -0.53
substances (coffee) 0.03 0.01 2.16


Conclusions:

The model comparison suggested a positive relationship between SOL and depressive symptoms, a negative relationship between SOL and the weekly amount of alcoholic units, a positive relationship between SOL and the weekly amount of coffees, and a higher SOL among smokers compared to non-smokers. Moreover, males showed shorted SOL than females.


WASO


WASO is modeled assuming a normal distribution for residuals and by excluding participant MZ53 (see previous sections).


# excluding missing observations (10 rows)
clean <- sleep.long[!is.na(sleep.long$TST)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                      !is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r)&
                      sleep.long!="MZ53",]

# null model
null <- lmer(WASO~(1|ID),
             data=clean,REML=FALSE)

# predictors
week.time <- lmer(WASO~week.time+(1|ID),
             data=clean,REML=FALSE)
SEX <- lmer(WASO~week.time+SEX+(1|ID),
            data=clean,REML=FALSE)
MEQ.r <- lmer(WASO~week.time+SEX+MEQ.r+(1|ID),
              data=clean,REML=FALSE)
PSQI <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+(1|ID),
             data=clean,REML=FALSE)
BDI <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
            data=clean,REML=FALSE)
NAP <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
            data=clean,REML=FALSE)
sub <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
            data=clean,REML=FALSE)


All models converge with no problems. We can proceed with the model comparison.


LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A


Comments:

  • None of the specified models shows a significantly higher logLikelihood than the null model.

  • Consistently, the null model shows the lowest AIC (AICw = .40), with model 1 (including only week.time) showing the second lowest AIC (AICw = .35, dAIC = 1.14). The dAIC suggests that the null model is only slightly more plausible than model 1.


# output table
WASO <- cbind(Model=c("Null","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP","substances"),
             round(LRT[,c("Chisq","Pr(>Chisq)")],2),
             A[order(A$df),],
             rbind(round(coef(summary(null)),2),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3)))
kable(WASO,format="pandoc",digits=2,align="l",caption="Table X. LMER models comparison and outputs")
Table X. LMER models comparison and outputs
Model Chisq Pr(>Chisq) df AIC AICw dAIC Estimate Std. Error t value
Null 3 5050.04 0.46 1.00 51.71 2.02 25.62
Week time 1.24 0.27 4 5050.80 0.32 1.46
Sex (males) 0.01 0.93 5 5052.79 0.12 3.96
MEQ-r 0.77 0.38 6 5054.02 0.06 7.32
PSQI 0.05 0.82 7 5055.97 0.02 19.39
BDI 0.56 0.46 8 5057.42 0.01 40.04
NAP 0.00 0.99 9 5059.41 0.00 108.31
substances 4.07 0.25 12 5061.34 0.00 284.29


Let’s see what would have been the results if MZ53 was not excluded.


# excluding missing observations and influential case (10 rows)
clean <- sleep.long[!is.na(sleep.long$WASO)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                      !is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]

# null model
null <- lmer(WASO~(1|ID),
             data=clean,REML=FALSE)

# predictors
week.time <- lmer(WASO~week.time+(1|ID),
             data=clean,REML=FALSE)
SEX <- lmer(WASO~week.time+SEX+(1|ID),
            data=clean,REML=FALSE)
MEQ.r <- lmer(WASO~week.time+SEX+MEQ.r+(1|ID),
              data=clean,REML=FALSE)
PSQI <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+(1|ID),
             data=clean,REML=FALSE)
BDI <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
            data=clean,REML=FALSE)
NAP <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
            data=clean,REML=FALSE)
sub <- lmer(WASO~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
            data=clean,REML=FALSE)

LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A
summary(sub)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## WASO ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +  
##     FUMO..SI.NO. + CAFFE_week + (1 | ID)
##    Data: clean
## 
##      AIC      BIC   logLik deviance df.resid 
##   5163.7   5215.8  -2569.8   5139.7      559 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4659 -0.5801 -0.0421  0.5483  4.3164 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 337.2    18.36   
##  Residual             354.9    18.84   
## Number of obs: 571, groups:  ID, 82
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept)    36.9767    11.0773   3.338
## week.timewe     1.1276     1.7456   0.646
## SEXM            4.9654     4.9445   1.004
## MEQ.r           0.8854     0.6392   1.385
## PSQI            0.1044     0.9762   0.107
## BDI.II         -0.3125     0.3505  -0.892
## NAP1           -0.4684     2.4681  -0.190
## ALCOL_week     -0.7457     0.7476  -0.997
## FUMO..SI.NO.1  11.4342     5.2566   2.175
## CAFFE_week      0.3529     0.2937   1.201
## 
## Correlation of Fixed Effects:
##             (Intr) wk.tmw SEXM   MEQ.r  PSQI   BDI.II NAP1   ALCOL_ FUMO..
## week.timewe -0.047                                                        
## SEXM        -0.212 -0.001                                                 
## MEQ.r       -0.808  0.000  0.028                                          
## PSQI        -0.403 -0.001  0.151  0.040                                   
## BDI.II      -0.124  0.001 -0.149  0.140 -0.547                            
## NAP1        -0.024  0.056  0.025 -0.001  0.001 -0.020                     
## ALCOL_week  -0.031  0.000 -0.445  0.026 -0.130  0.185 -0.012              
## FUMO..SI.NO -0.168 -0.002  0.044  0.163  0.062 -0.095 -0.023 -0.291       
## CAFFE_week  -0.042 -0.002  0.197 -0.212  0.028 -0.054 -0.021 -0.332 -0.134


Comments:

  • including MZ53 would have led to a slightly higher effect of Smoke, with higher WASO on smokers compared to non-smokers (coef. = 11.41, SE = 5.26, t = 2.17)


Conclusions:

None of the included predictors exerted a significant effect on WASO. The effect of Smoke was not substantial, and mainly due to participant MZ53.


BT


BT is modeled assuming a normal distribution for residuals and including all participants (see previous sections).


# excluding missing observations (3 rows)
clean <- sleep.long[!is.na(sleep.long$BT)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                      !is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]

# null model
null <- lmer(BT~(1|ID),
             data=clean,REML=FALSE)

# predictors
week.time <- lmer(BT~week.time+(1|ID),
             data=clean,REML=FALSE)
SEX <- lmer(BT~week.time+SEX+(1|ID),
            data=clean,REML=FALSE)
MEQ.r <- lmer(BT~week.time+SEX+MEQ.r+(1|ID),
              data=clean,REML=FALSE)
PSQI <- lmer(BT~week.time+SEX+MEQ.r+PSQI+(1|ID),
             data=clean,REML=FALSE)
BDI <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
            data=clean,REML=FALSE)
NAP <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
            data=clean,REML=FALSE)
sub <- lmer(BT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
            data=clean,REML=FALSE)


All models converge with no problems. We can proceed with the model comparison.


LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A


Comments:

  • The model including the effect of week.time shows a significantly higher logLikelihood compared to the null model for BT. Moreover, the LRT suggests a significant contribution of participants’ sex over week time, a significant contribution of MEQ-r score over week time and sex, and a significant contribution of substances’ effect over the other predictors.

  • Coherently, the last model (with the effect of substances) shows the lowest AIC (AICw = .55), with the model with MEQ-r showing the second lowest AIC (AICw = .30). The dAIC suggests that the former is about 2 times more plausible than the latter.


Now, let’s proceed by evaluating the output of the selected model.


summary(sub)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## BT ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +  
##     FUMO..SI.NO. + CAFFE_week + (1 | ID)
##    Data: clean
## 
##      AIC      BIC   logLik deviance df.resid 
##   2063.6   2115.8  -1019.8   2039.6      559 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3411 -0.5607 -0.0853  0.4495  7.4691 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 0.6541   0.8087  
##  Residual             1.7316   1.3159  
## Number of obs: 571, groups:  ID, 82
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)   26.359300   0.535306  49.242
## week.timewe    0.688773   0.121918   5.649
## SEXM           0.182231   0.238626   0.764
## MEQ.r         -0.143328   0.030829  -4.649
## PSQI           0.002443   0.047078   0.052
## BDI.II         0.001894   0.016912   0.112
## NAP1           0.012768   0.168771   0.076
## ALCOL_week     0.071953   0.036051   1.996
## FUMO..SI.NO.1  0.420832   0.253573   1.660
## CAFFE_week     0.002933   0.014161   0.207
## 
## Correlation of Fixed Effects:
##             (Intr) wk.tmw SEXM   MEQ.r  PSQI   BDI.II NAP1   ALCOL_ FUMO..
## week.timewe -0.067                                                        
## SEXM        -0.211 -0.001                                                 
## MEQ.r       -0.807  0.000  0.027                                          
## PSQI        -0.403 -0.001  0.151  0.040                                   
## BDI.II      -0.124  0.001 -0.151  0.140 -0.546                            
## NAP1        -0.034  0.055  0.035 -0.001  0.002 -0.029                     
## ALCOL_week  -0.031  0.001 -0.445  0.026 -0.130  0.186 -0.018              
## FUMO..SI.NO -0.168 -0.002  0.043  0.164  0.063 -0.094 -0.033 -0.290       
## CAFFE_week  -0.041 -0.002  0.196 -0.212  0.028 -0.054 -0.029 -0.332 -0.133


Comments:

  • BT is predicted by the week time, with weekend days showing later BT compared to week days. Moreover, BT was predicted by the MEQ-r score, with earlier BT being predicted by higher scores (i.e., indicating morning circadian preferences). Finally, weekly alcohol consumption shows a weaker positive effect on BT.

  • the effect of sex (which is positive, suggesting later BT for males) is not substantial when considering all other predictors.


Below, the effects are plotted and a summary table is generated.


# effect plots
plot_model(sub, type = "eff",dot.size=7,line.size=3)$'SEX'+
  labs(x="Gender",y="Bed time")+ggtitle("")+
  scale_x_continuous(breaks=c(1,2),labels=c("Females","Males"),limits=c(.75,2.25))+
  scale_y_continuous(labels=c("00:48","01:00","01:12","01:24","01:36"))+
  geom_segment(x=1,xend=2,y=25.11,yend=25.29,lwd=1.5,lty=2)+
  theme(text=element_text(size=20,face="bold",family="time"))

plot_model(sub, type = "eff",dot.size=7,line.size=3)$'week.time'+
  labs(x="Week time",y="Bed time")+ggtitle("")+
  scale_x_continuous(breaks=c(1,2),labels=c("Weekend","Weekdays"),limits=c(.75,2.25))+
  theme(text=element_text(size=20,face="bold",family="time"))

plot_model(sub, type = "eff")$'MEQ.r'+geom_line(lwd=2)+
  labs(x="MEQ-r scores",y="Bedtime (hours from 00:00)")+ggtitle("")+
  geom_point(data=sleep.long,aes(MEQ.r,BT),alpha=.3)+ylim(23.5,27)+xlim(5,20)+
  theme(text=element_text(size=20,face="bold",family="time"))

# output table
BT <- cbind(Model=c("Null","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP",
                     "substances (alcohol)","substances (smoke)","substances (coffee)"),
             rbind(round(LRT[,c("Chisq","Pr(>Chisq)")],2),NA,NA),
             rbind(A[order(A$df),],NA,NA),
             rbind(round(coef(summary(MEQ.r)),2),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3)))
kable(BT,format="pandoc",digits=2,align="l",caption="Table X. LMER models comparison and outputs")
Table X. LMER models comparison and outputs
Model Chisq Pr(>Chisq) df AIC AICw dAIC Estimate Std. Error t value
Null 3 2111.82 0.00 4.196073e+10 26.78 0.45 59.54
Week time 30.94 0.00 4 2082.87 0.00 2.169853e+04 0.69 0.12 5.65
Sex (males) 3.84 0.05 5 2081.03 0.00 8.647280e+03 0.43 0.23 1.88
MEQ-r 20.14 0.00 6 2062.90 0.44 1.000000e+00 -0.15 0.03 -4.78
PSQI 0.04 0.83 7 2064.85 0.17 2.650000e+00
BDI 0.02 0.89 8 2066.83 0.06 7.130000e+00
NAP 0.06 0.81 9 2068.77 0.02 1.882000e+01
substances (alcohol) 11.14 0.01 12 2063.63 0.31 1.440000e+00
substances (smoke)
substances (coffee)


Conclusions:

The model’s comparison suggested a negative relationship between BT and the score on the MEQ-r, a positive relationship between BT and the weekly amount of consumed alcoholic units, and later BT during the weekend compared to weekdays.


WT


WT is modeled assuming a normal distribution for residuals and including all participants (see previous sections). Note that for testing the effects of week.time on this variable, we should use WT.dayAfter, which is referred to the subsequent day (e.g., WT on day 7 is Monday WT). In this way, Monday to Friday are considered weekdays, whereas Saturday and Sunday are considered weekend days.


# excluding missing observations (3 rows)
clean <- sleep.long[!is.na(sleep.long$WT)&!is.na(sleep.long$PSQI)&!is.na(sleep.long$BDI.II)&
                      !is.na(sleep.long$STAI.Y2)&!is.na(sleep.long$MEQ.r),]
clean$WT <- clean$WT.dayAfter # using the WT referred to the subsequent day

# null model
null <- lmer(WT.dayAfter~(1|ID),
             data=clean,REML=FALSE)

# predictors
week.time <- lmer(WT.dayAfter~week.time+(1|ID),
             data=clean,REML=FALSE)
SEX <- lmer(WT.dayAfter~week.time+SEX+(1|ID),
            data=clean,REML=FALSE)
MEQ.r <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+(1|ID),
              data=clean,REML=FALSE)
PSQI <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+PSQI+(1|ID),
             data=clean,REML=FALSE)
BDI <- lmer(WT.dayAfter~week.time+SEX+MEQ.r+PSQI+BDI.II+(1|ID),
            data=clean,REML=FALSE)
NAP <- lmer(WT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+(1|ID),
            data=clean,REML=FALSE)
sub <- lmer(WT~week.time+SEX+MEQ.r+PSQI+BDI.II+NAP+ALCOL_week+FUMO..SI.NO.+CAFFE_week+(1|ID),
            data=clean,REML=FALSE)


All models converge with no problems. We can proceed with the model comparison.


LRT <- lmtest::lrtest(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub)
LRT
A <- round(AIC(null,week.time,SEX,MEQ.r,PSQI,BDI,NAP,sub),2)
A <- A[order(A$AIC),]
A$AICw <- round(Weights(A$AIC),2)
A$dAIC <- round(exp((A$AIC - A[1,2])/2),2)
A


Comments:

  • The model including the effect of week.time shows a significantly higher logLikelihood compared to the null model for BT. Moreover, the LRT suggests a significant contribution of the MEQ-r score over week time and sex, and a significant contribution of substances’ effect over the other predictors.

  • Coherently, the last model (with the effect of substances) shows the lowest AIC (AICw = .59), with the model with MEQ-r showing the second lowest AIC (AICw = .26). The dAIC suggests that the former is about 2 times more plausible than the latter.


Now, let’s proceed by evaluating the output of the selected model.


summary(sub)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## WT ~ week.time + SEX + MEQ.r + PSQI + BDI.II + NAP + ALCOL_week +  
##     FUMO..SI.NO. + CAFFE_week + (1 | ID)
##    Data: clean
## 
##      AIC      BIC   logLik deviance df.resid 
##   1877.0   1929.0   -926.5   1853.0      553 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9786 -0.6070 -0.0609  0.5764  4.0169 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  ID       (Intercept) 0.3716   0.6096  
##  Residual             1.3315   1.1539  
## Number of obs: 565, groups:  ID, 82
## 
## Fixed effects:
##                Estimate Std. Error t value
## (Intercept)   10.217482   0.424172  24.088
## week.timewe    0.892776   0.107398   8.313
## SEXM          -0.173819   0.189146  -0.919
## MEQ.r         -0.141341   0.024402  -5.792
## PSQI           0.007015   0.037254   0.188
## BDI.II        -0.008330   0.013381  -0.623
## NAP1          -0.017482   0.147722  -0.118
## ALCOL_week     0.054181   0.028495   1.901
## FUMO..SI.NO.1  0.380333   0.201060   1.892
## CAFFE_week    -0.001215   0.011184  -0.109
## 
## Correlation of Fixed Effects:
##             (Intr) wk.tmw SEXM   MEQ.r  PSQI   BDI.II NAP1   ALCOL_ FUMO..
## week.timewe -0.073                                                        
## SEXM        -0.208 -0.004                                                 
## MEQ.r       -0.807 -0.003  0.027                                          
## PSQI        -0.404 -0.003  0.149  0.042                                   
## BDI.II      -0.124  0.005 -0.155  0.139 -0.544                            
## NAP1        -0.035  0.054  0.036 -0.006  0.001 -0.028                     
## ALCOL_week  -0.032  0.003 -0.447  0.026 -0.129  0.189 -0.018              
## FUMO..SI.NO -0.172 -0.005  0.039  0.167  0.068 -0.092 -0.040 -0.286       
## CAFFE_week  -0.040 -0.003  0.197 -0.211  0.027 -0.055 -0.033 -0.332 -0.133


Comments:

  • BT is predicted by the week time, with weekend days showing slightly later BT compared to week days. The score on the MEQ-r is the strongest predictor, with earlier WT being predicted by higher scores. Finally, weekly alcohol consumption shows a positive but very small effect on BT and an even weaker effect of smoke.


Below, the effects are plotted and a summary table is generated.


# effect plots
plot_model(sub, type = "eff",dot.size=7,line.size=3)$'week.time'+
  labs(x="Week time",y="Weak time (hours from 00:00)")+ggtitle("")+
  scale_x_continuous(breaks=c(1,2),labels=c("Weekend","Weekdays"),limits=c(.75,2.25))+
  theme(text=element_text(size=20,face="bold",family="time"))

plot_model(sub, type = "eff")$'MEQ.r'+geom_line(lwd=2)+
  labs(x="MEQ-r scores",y="Weak time (hours from 00:00)")+ggtitle("")+
  geom_point(data=sleep.long,aes(MEQ.r,WT),alpha=.3)+ylim(6,10.5)+xlim(5,20)+
  theme(text=element_text(size=20,face="bold",family="time"))

# output table
WT <- cbind(Model=c("Null","Week time","Sex (males)","MEQ-r","PSQI","BDI","NAP",
                     "substances (alcohol)","substances (smoke)","substances (coffee)"),
             rbind(round(LRT[,c("Chisq","Pr(>Chisq)")],2),NA,NA),
             rbind(A[order(A$df),],NA,NA),
             rbind(round(coef(summary(MEQ.r)),2),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3),rep(NA,3)))
kable(WT,format="pandoc",digits=2,align="l",caption="Table X. LMER models comparison and outputs")
Table X. LMER models comparison and outputs
Model Chisq Pr(>Chisq) df AIC AICw dAIC Estimate Std. Error t value
Null 3 1963.99 0.00 9.473215e+18 10.43 0.36 29.24
Week time 64.64 0.00 4 1901.35 0.00 2.368068e+05 0.89 0.11 8.33
Sex (males) 0.18 0.67 5 1903.17 0.00 5.883045e+05 0.01 0.18 0.07
MEQ-r 28.57 0.00 6 1876.60 0.41 1.000000e+00 -0.14 0.02 -5.85
PSQI 0.01 0.91 7 1878.59 0.15 2.700000e+00
BDI 0.69 0.41 8 1879.90 0.08 5.210000e+00
NAP 0.02 0.90 9 1881.89 0.03 1.408000e+01
substances (alcohol) 10.90 0.01 12 1876.99 0.34 1.220000e+00
substances (smoke)
substances (coffee)


Conclusions:


The model comparison suggested a negative relationship between WT and the score on the MEQ-r, a positive relationship between WT and the weekly amount of consumed alcoholic units, and later BT during the weekend compared to weekdays.